DETAILED ACTION

Response to Amendment 

1. In response to the office action from 7/25/2006, the applicant has submitted an
amendment, filed 10/4/2006, amending claims 1, 10, 27, and 34, while arguing to traverse the art
rejection based on the limitations of claims 1 and 27 (Amendment, Pages 12-13). Applicant's
arguments have been fully considered; however, the previous rejection is maintained for the
reasons listed below in the response to arguments.

2. Due to the amendment of Claims 10 and 34, the examiner has withdrawn the previous 
claim objection directed towards minor informalities. 

Response to Arguments 

3. Applicant's arguments have been fully considered but they are not persuasive for the 
following reasons: 

With respect to Claims 1 and 27, the applicant argues that: 

(a.) Charlesworth et al (U.S. Patent: 6,990,448) fails to disclose normalized text data 
being associated with a voice or typed signal to generate a "sounds-like" pair, updating a lexicon 
with a "sounds-like" pair, tagging a data file using a "sounds-like" pair, and displaying
normalized text in a voice tag editor (Amendment, Pages 12-13).

(b.) Baker et al (U.S. Patent: 6,092,044) fails to disclose a "sounds-like" pair including
normalized text and alphanumeric characters, a text parser operable to generate normalized text 
corresponding to alphanumeric characters, and a voice tag editor that displays normalized text 
(Amendment, Page 13). 

In response to (a) and (b) (i.e., applicant's arguments against the references individually), 
the examiner notes that one cannot show nonobviousness by attacking references individually 
where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 
208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 
1986). 

Also, in response to (a) and (b), the examiner points out that the applicant's arguments 
fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims
define a patentable invention without specifically pointing out how the language of the claims 
patentably distinguishes them from the references. 

In response to the applicant's above noted arguments, the examiner additionally notes that 
it is not the prior art references individually that teach the aforementioned limitations, but the 
combination of the teachings of Charlesworth and Baker. As noted in the prior Office Action 
(Page 2), Charlesworth discloses a voice tagging system that allows a user to annotate 
image/video data using a voice tag that is in the form of a "sounds-like" pair (text annotation 
(alphanumeric characters) and a normalized phoneme sequence representative of a particular 
word pronunciation (i.e., normalized text that serves as recognition text), Col. 9, Line 61- Col 
10, Line 30). Charlesworth, however, does not disclose a means for generating the voice tags 
according to the system/method presented in the claimed invention.

The teachings of Baker resolve the above noted deficiencies with respect to
Charlesworth. Baker discloses an editor for receiving alphanumeric characters indicative of a 
word, displaying the characters, and editing the characters, which corresponds to the 
editor/receiving means/step of the presently claimed invention (prior Office Action, Page 3; 
editor display, Fig. 17; and Col 17, Line 66- Col 18, Line 6). Baker further discloses a text 
parser that segments an input word into a phoneme sequence for speech recognition (i.e., 
normalized text that serves as recognition text), which corresponds to the parsing means/step of 
the presently claimed invention (prior Office Action, Page 3; Col 15, Line 56- Col 16, Line 5; 
and editor display of a normalized phoneme sequence, Fig. 17, Element 1756). Finally, Baker 
discloses a dictionary that associates and stores a word and phoneme sequence in a pronunciation 
or "sounds-like" pair, which corresponds to the storage step/means of the presently claimed 
invention (prior Office Action, Page 3; Col. 15, Line 56- Col. 16, Line 5). 

Thus, since Charlesworth discloses a method for annotating image data utilizing voice 
tags comprising a text annotation/phoneme sequence pair and Baker recites a system for 
generating such pairs utilizing an editor, parser, and dictionary storage for the benefit of 
personalizing words in a speech recognition vocabulary by representing how a word is spoken by 
a particular speaker (Baker, Col 1, Lines 9-21 and 50-57), Claims 1 and 27 remain rejected. 

The dependent claims are argued as further limiting rejected independent claims 
(Amendment, Pages 13-14), and thus, these claims also remain rejected.

Claim Objections

4. Claim 39 is objected to because of the following informalities: 
In line 2, "it" should be changed to --if--.

Appropriate correction is required. 

Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole, would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

6. Claims 1-4, 10-11, 27-28, 30, and 34-35 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Charlesworth et al (U.S. Patent: 6,990,448) in view of Baker et al (U.S. 
Patent: 6,092,044). 

With respect to Claims 1 and 27, Charlesworth discloses a voice annotation (tag) system, 
that allows a user to generate text annotations corresponding to a media file through speech-to- 
text conversion, wherein the annotations (tags) comprise text (alphanumeric characters 
indicative of a voice tag) and an associated phoneme string (normalized text that serves as 
recognition text during retrieval) (Col. 9, Line 61- Col 10, Line 30).

Charlesworth does not teach how a voice recognizer utilized in voice tag generation is
trained to identify spoken words corresponding to voice tags, specifically using the means noted
in claims 1 and 27; however, Baker discloses:

An editor receptive of alphanumeric characters indicative of a word, the editor configured 
to display and edit the alphanumeric characters (word editor accepting typed and speech- 
generated text, Fig. 17, Col 17, Line 66- Col 18, Line 6);

A text parser connected to the editor and operable to generate normalized text 
corresponding to the alphanumeric characters, such that the normalized text serves as recognition 
text for the word and is displayed by the editor (segmenting a recognized word into a phoneme 
sequence, Col 15, Line 56- Col 16, Line 5; and editor display, Fig. 17); and

A storage mechanism connected to the editor and operable to associate the displayed 
alphanumeric characters with the corresponding normalized text, thereby developing a "sounds 
like" pair and to update a lexicon with the "sounds like" pair (dictionary storage of word and 
phonetic spelling (pronunciation) pairs, Col 16, Lines 1-5). 
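
For illustration only, the editor input, parser, and storage elements recited above can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Charlesworth, Baker, or the application: the names SoundsLikePair, normalize, and add_to_lexicon are hypothetical, and the toy normalizer (lowercasing and spelling out digits) merely stands in for the phoneme-sequence generation that this action maps to normalized text.

```python
from dataclasses import dataclass

# Hypothetical digit table used by the toy normalizer below.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


@dataclass(frozen=True)
class SoundsLikePair:
    display_text: str      # alphanumeric characters as entered in the editor
    recognition_text: str  # normalized text that serves as recognition text


def normalize(text: str) -> str:
    """Toy parser: lowercase letters, spell out digits, drop other characters."""
    words = []
    for token in text.split():
        current = ""
        for ch in token:
            if ch in DIGIT_WORDS:
                if current:          # flush letters collected before the digit
                    words.append(current)
                    current = ""
                words.append(DIGIT_WORDS[ch])
            elif ch.isalpha():
                current += ch.lower()
            # any other character (punctuation, symbols) is dropped
        if current:
            words.append(current)
    return " ".join(words)


def add_to_lexicon(lexicon: dict, display_text: str) -> SoundsLikePair:
    """Associate the entered text with its normalized form and store the pair."""
    pair = SoundsLikePair(display_text, normalize(display_text))
    lexicon[display_text] = pair
    return pair


lexicon = {}
pair = add_to_lexicon(lexicon, "Route 66")
print(pair.display_text, "->", pair.recognition_text)
```

In this sketch the lexicon is a plain dictionary keyed by the displayed characters; an actual system would presumably store a phonetic transcription rather than spelled-out words.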

Charlesworth and Baker are analogous art because they are from a similar field of 
endeavor in word recognition. Thus, it would have been obvious to a person of ordinary skill in 
the art, at the time of invention, to modify the teachings of Charlesworth with the speech 
recognition word editor taught by Baker in order to allow a user to add and personalize words 
(corresponding to voice tags in the case of Charlesworth) in a speech recognition vocabulary by 
representing how a word is spoken by a particular speaker (Baker, Col 1, Lines 9-21 and 50-57). 

With respect to Claim 2, Baker further recites:

The alphanumeric characters are typed in via a keyboard connected to the editor
(keyboard, Fig. 1, Element 115). 

Also, Charlesworth discloses the use of a keyboard for entering a voice annotation (Col. 
10, Lines 36- 53). 

With respect to Claims 3 and 28, Charlesworth recites the voice annotations as applied 
to claim 1, while Baker further discloses: 

The word editor is connected to the lexicon (editor connected to a dictionary for word 
addition, Col 15, Line 56- Col 16, Line 6) and further configured to display a list of words 
residing in the lexicon (displaying word history, Col. 18, Lines 52-58). 

With respect to Claim 4, Charlesworth discloses a phoneme symbol sequence 
(normalized text) used by a speech recognizer in a voice annotation (tagging) system (Col. 9, 
Line 61- Col 10, Line 30), while Baker recites the dictionary containing phoneme sequence data 
for use by a speech recognizer (Fig. 2, Elements 230 and 245) as applied to Claim 1.

With respect to Claims 10 and 34, Baker further discloses:

The editor is configured to modify existing word "sounds like" pairs stored in the lexicon 
(editing a word history using an editor, Col. 18, Lines 42-58). 

With respect to Claims 11 and 35, Baker further recites: 

The editor is configured to modify a phonetic transcription used by the speech recognizer 
(editing a word pronunciation comprising a phoneme sequence transcription, Col. 17, Lines 66- 
Col 18, Line 6; and Col 18, Lines 42-58). 

With respect to Claim 30, Baker teaches the phoneme sequence utilized in speech 
recognition as applied to Claim 27.

7. Claims 5 and 31 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Charlesworth et al in view of Baker et al, and further in view of Hashimoto et al (U.S. Patent: 
5,632,002). 

With respect to Claims 5 and 31, Charlesworth in view of Baker discloses the voice 
annotation dictionary editor display as applied to Claims 1 and 27. Although Charlesworth 
discloses the use of topic-based dictionaries (Col 6, Lines 15-22), Charlesworth in view of Baker 
does not specifically suggest displaying a dictionary topic, however Hashimoto recites a means 
for displaying speech recognition dictionary names indicative of dictionary content (Col. 68, 
Lines 50-65). 

Charlesworth, Baker, and Hashimoto are analogous art because they are from a similar 
field of endeavor in speech recognition. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to modify the teachings of Charlesworth in view 
of Baker with the dictionary content display means taught by Hashimoto in order to allow a user 
to easily access various dictionary contents for editing operations that can reduce memory 
requirements (Hashimoto, Col 68, Lines 31-42). 

8. Claims 6-7, 9, 17-18, 32, and 41 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Charlesworth et al in view of Baker et al, and further in view of Morrison 
(U.S. Patent: 5,425,128). 

With respect to Claims 6 and 32, Charlesworth in view of Baker discloses the voice 
annotation dictionary editor as applied to Claims 1 and 27. Although Charlesworth discloses that 
dictionary words may be imported from an input file (Col. 18, Lines 42-58), Charlesworth in
view of Baker does not specifically suggest importing a lexicon from an external data source, 
however Morrison discloses importing a vocabulary for a particular speech recognition 
application from an external host computer (Col 5, Lines 25-60). 

Charlesworth, Baker, and Morrison are analogous art because they are from a similar 
field of endeavor in speech recognition. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to modify the teachings of Charlesworth in view 
of Baker with the vocabulary importing means disclosed by Morrison in order to implement the 
ability to process speech from a particular speaker at multiple computer systems without the need 
for portable media (Morrison, Col 2, Lines 30-51). 

With respect to Claim 7, Charlesworth recites the voice annotation system as applied to 
claim 1, while Morrison further recites: 

The external data source receives a request from a client computer (vocabulary request 
based on a particular application, Col 5, Lines 25-60). 

With respect to Claim 9, Morrison further recites: 

The request includes content requirements for a lexicon (vocabulary request based on a 
particular application, Col 5, Lines 25-60). 

With respect to Claims 17-18 and 41, Morrison further recites: 

Uploading the lexicon including a content description to a remote location (uploading a 
speech recognition vocabulary, associated with a particular application (content description), to 
a host computer, Col 9, Lines 46-48).

9. Claims 8 and 33 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Charlesworth et al in view of Baker et al, further in view of Morrison, and yet further in view of 
Hashimoto et al. 

With respect to Claims 8 and 33, Charlesworth et al in view of Baker et al, and further in 
view of Morrison discloses the voice annotation dictionary that can be downloaded from a host 
computer as applied to Claims 6 and 32. Charlesworth et al in view of Baker et al, and further in 
view of Morrison do not specifically suggest displaying an available dictionary list, however 
Hashimoto discloses displaying such a list (Col 21, Lines 25-40). 

Charlesworth, Baker, Morrison, and Hashimoto are analogous art because they are from a similar
field of endeavor in speech recognition. Thus, it would have been obvious to a person of
ordinary skill in the art, at the time of invention, to modify the teachings of Charlesworth in view
of Baker and Morrison with the dictionary list display disclosed by Hashimoto in order to enable a user to
quickly switch to an appropriate vocabulary and suppress the recognition error rate (Hashimoto,
Col 21, Lines 25-40).

10. Claims 12 and 36 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Charlesworth et al in view of Baker et al, and further in view of Korall et al (U.S. Patent: 
6,996,531). 

With respect to Claims 12 and 36, Charlesworth in view of Baker discloses the voice 
annotation dictionary editor utilizing text-to-phoneme conversion, as applied to Claims 1 and 27. 
Charlesworth in view of Baker does not specifically suggest that a user is prompted if
normalized text cannot be generated by a parser; however, Korall recites prompting a speaker if
an input cannot be resolved into phonemes (Col 8, Lines 31-38). 

Charlesworth, Baker, and Korall are analogous art because they are from a similar field 
of endeavor in speech recognition. Thus, it would have been obvious to a person of ordinary
skill in the art, at the time of invention, to modify the teachings of Charlesworth in view of Baker 
with the prompting means disclosed by Korall in order to provide an effective recognition 
fallback method if an input cannot be resolved (Korall, Col 8, Lines 31-38). 

11. Claims 13-16 and 37-40 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Charlesworth et al in view of Baker et al, and further in view of Young et al (U.S. Patent: 
6,064,959). 

With respect to Claims 13 and 37, Charlesworth in view of Baker discloses the voice 
annotation dictionary editor utilizing text-to-phoneme conversion, as applied to Claims 1 and 27. 
Charlesworth in view of Baker does not specifically suggest speech recognition data verification,
however Young recites a means for verifying and correcting a speech recognition word 
pronunciation (Col 21, Lines 35-61). 

Charlesworth, Baker, and Young are analogous art because they are from a similar field 
of endeavor in speech recognition. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Charlesworth in view of Baker 
with the means for verifying and correcting a speech recognition word pronunciation as taught 
by Young in order to provide a computer-implemented means for speech recognition error 
correction (Young, Abstract).

With respect to Claims 14 and 38, Young further discloses correcting (modifying) a
pronunciation corresponding to a particular word (Col 21, Lines J 1-26). 

With respect to Claims 15 and 40, Young further discloses the use of an n-best list (Col 
21 Lines 35-61). 

With respect to Claim 16, Young further discloses the use of a recognition hypothesis 
score (Col 4, Lines 34-51). 

Claim 39 contains subject matter similar to Claims 11 and 14, and thus, is rejected for the
same reasons. 

12. Claims 19-22, 29, 42, and 43 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Charlesworth et al in view of Baker et al, and further in view of Sabourin et al (U.S. Patent: 
6,073,099). 

With respect to Claims 19 and 42, Charlesworth in view of Baker discloses the voice 
annotation dictionary editor utilizing text-to-phoneme conversion, as applied to Claims 1 and 29. 
Charlesworth in view of Baker does not specifically suggest identifying recognition text that is
confusingly similar; however, Sabourin recites a means for identifying a confusability cost
between two phonetic transcriptions (recognition text) (Col 3, Line 31- Col 4, Line 13).

Charlesworth, Baker, and Sabourin are analogous art because they are from a similar 
field of endeavor in speech recognition. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to modify the teachings of Charlesworth in view 
of Baker with the means for identifying a confusability cost between two phonetic transcriptions 
as taught by Sabourin in order to provide a means for automatically generating an objective 
metric of the likelihood of confusing a spoken word with another spoken word (Sabourin, Col 1,
Lines 31-34). 

With respect to Claims 20 and 43, Sabourin further discloses: 

Identifying the at least one other word includes calculating a measure distance between 
phonetic transcriptions associated with each recognition text, where the measure distance is 
indicative of similarity between the phonetic transcriptions (calculation of a Levenshtein distance
between two phonetic transcriptions, Col 4, Lines 15-50; and Col 1, Lines 40-51). 

With respect to Claim 21, Sabourin further recites: 

The measure distance is based on a number of edit operations needed to make the 
phonetic transcriptions identical (Levenshtein distance based upon editing operations, Col 4,
Lines 15-50). 
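
As a point of reference, an edit-operation distance of the kind named in Claim 21 can be sketched as below. This is a generic textbook Levenshtein computation over phoneme sequences, offered as an assumption-laden illustration and not as the specific method of Sabourin or of the application; the function name edit_distance and the example transcriptions are hypothetical.

```python
def edit_distance(a, b):
    """Count the insertions, deletions, and substitutions needed to turn
    sequence a into sequence b (standard Levenshtein distance)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x from a
                curr[j - 1] + 1,     # insert y into a
                prev[j - 1] + cost,  # substitute x with y, or keep a match
            ))
        prev = curr
    return prev[-1]


# Hypothetical phonetic transcriptions, one symbol per phoneme.
cat = ["K", "AE", "T"]
cap = ["K", "AE", "P"]
dog = ["D", "AO", "G"]

print(edit_distance(cat, cap))  # one substitution apart
print(edit_distance(cat, dog))  # every phoneme differs
```

A small distance indicates transcriptions that need few edits to become identical and are therefore more likely to be confused; where a threshold would be set is outside the scope of this sketch.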

With respect to Claim 22, Sabourin further recites: 

Providing alternative recognition text of the desired word (replacing confusable words
with alternative non-confusable synonyms, Col 10, Line 60- Col 11, Line 8). 

Claim 29 contains subject matter similar to Claims 19 and 22, and thus, is rejected for the 
same reasons. 

13. Claims 23-24, 29, and 44 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Charlesworth et al in view of Baker et al, and further in view of Hirayama et al (US Patent: 
6,708,150). 

With respect to Claims 23, 29, and 44, Charlesworth in view of Baker discloses the 
voice annotation dictionary editor utilizing text-to-phoneme conversion, as applied to Claim 1.
Charlesworth in view of Baker does not specifically suggest identifying an unbalanced phrase
length, however Hirayama discloses a means for detecting phrases exceeding a specific length 
(Col 12, Line 63- Col. 13, Line 15). 

Charlesworth, Baker, and Hirayama are analogous art because they are from a similar 
field of endeavor in speech recognition. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to modify the teachings of Charlesworth in view 
of Baker with the means for detecting phrases exceeding a specific length as taught by Hirayama 
in order to increase speech recognition reliability by utilizing shorter word alternatives 
(Hirayama, Col 12, Line 47- Col 13, Line 15). 

With respect to Claim 24, Hirayama discloses the use of shorter word alternatives as 
applied to Claim 23. 

14. Claims 25-26, 29, and 45-46 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Charlesworth et al in view of Baker et al, and further in view of Goronzy (U.S. Patent 
Publication: 2002/0111805). 

With respect to Claims 25, 29, and 45, Charlesworth in view of Baker discloses the 
voice annotation dictionary editor utilizing text-to-phoneme conversion, as applied to Claims 1 
and 27. Charlesworth in view of Baker does not specifically suggest identifying words that are
hard to pronounce; however, Goronzy recites detecting words that have difficult pronunciations
(Paragraphs 0005 and 0053).

Charlesworth, Baker, and Goronzy are analogous art because they are from a similar field 
of endeavor in speech recognition. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Charlesworth in view of Baker
with the means for detecting words that have difficult pronunciations as taught by Goronzy in 
order to manage the problem of decreasing recognition rates for speech in a target language 
given by a non-native speaker (Goronzy, Paragraph 0005). 

With respect to Claims 26 and 46, Goronzy further recites: 

Providing alternative recognition text of the desired word (providing pronunciation 
variants for particular words, Paragraph 0058). 

Conclusion 

15. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO 
MONTHS of the mailing date of this final action and the advisory action is not mailed until after 
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period 
will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 
CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event,
however, will the statutory period for reply expire later than SIX MONTHS from the mailing 
date of this final action.

16. The prior art made of record and not relied upon is considered pertinent to applicant's
disclosure: 

Huang et al (U.S. Patent: 5,933,804)- teaches a speech recognition system that allows a 
user to edit a pronunciation string generated from a character string input. 

Shaw et al (U.S. Patent: 6,363,342)- discloses a system for developing word 
pronunciation pairs. 

Case (U.S. Patent Publication: 2003/0130847)- discloses a method for associating a text
spelling with a phonetic spelling for pronunciation.

Rajput et al (U.S. Patent Publication: 2004/0034524)- discloses a method for generating
a phonetic spelling from input text for speech recognition.

Riley et al ("Automatic Generation of Detailed Pronunciation Lexicons," 1996)-
discloses a system for generating lexicons containing pronunciation data. 

17. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to James S. Wozniak whose telephone number is (571) 272-7632. 
The examiner can normally be reached on M-Th, 7:30-5:00, F, 7:30-4, Off Alternate Fridays. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth can be reached at (571) 272-7843. The fax phone number for the 
organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 



James S. Wozniak 
11/2/2006 






